Statement of Problem

It is well known that when sets of dummy variables are included in
a regression function, multicollinearity with the constant term prevents
the least-squares estimation of the coefficients. The common practice to
circumvent this difficulty is to exclude any one of the dummy variables in
a set, and then proceed merrily with the estimation. However, one might
be interested in comparing the relative importance of all the dummy
variables in a set for the prediction. Therefore, the question arises as to
the estimation of the coefficient of the excluded dummy variable and the
different effects of excluding one or another dummy variable in the set.

The objective of this paper is to review the known facts on using one
set of dummy variables in regression analysis, and to investigate the
advantage of applying the principal components technique to the general case
of more than one set of dummy variables. Two empirical examples will be
used to illustrate the behavior of least-squares estimates of the
coefficients of dummy variables for various specifications of a function.

The  Simple  Case 

In the simple case of only one set of dummy variables with other
independent variables in the equation, these problems have been analyzed. [1]
The approach used is to exclude the constant term from the original
specification to allow estimation of the coefficients of all the dummy
variables in the set. Then any one dummy variable is omitted in the second
specification, and the constant term included. A comparison of the
least-squares estimates of the coefficients of the regressors in the second
specification with those in the original reveals the following facts, the
proofs of which are given in Appendix I for completeness.

[1] Goldberger, Arthur S. Econometric Theory (New York: John Wiley and
Sons, Inc., 1964), pp. 218-227.

1. The constant term in the second specification is the coefficient, in
the original specification, of the dummy variable now omitted.

2. The coefficients of the included dummy variables in the second
specification are the differences between their original coefficients and
that of the omitted dummy variable.

3. The coefficients of other non-dummy regressors, if any, remain
unchanged.

4. Both estimated functions will give the same estimated values of
the regressand, and therefore R^2 remains the same.
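
The four facts above can be checked numerically. The following is a minimal sketch in Python with NumPy, run on synthetic data rather than the sample used in this paper; the simulated coefficients, the seed, and all variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 38  # same size as the paper's subsample, but the data here are synthetic

# One set of three mutually exclusive dummies plus two ordinary regressors.
group = np.arange(N) % 3                 # every category is represented
D = np.eye(3)[group]                     # N x 3 matrix [D1, D2, D3]
Xk = rng.normal(size=(N, 2))             # stand-ins for x1 and x2
y = D @ np.array([20.0, 25.0, 22.0]) + Xk @ np.array([1.5, -0.7]) + rng.normal(size=N)

def ols(X, y):
    """Least-squares coefficients for a full-column-rank regressor matrix."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Original specification: all three dummies, no constant term.
XA = np.hstack([D, Xk])
bA = ols(XA, y)

# Second specification: constant term included, d1 omitted.
XB = np.hstack([np.ones((N, 1)), D[:, 1:], Xk])
bB = ols(XB, y)

fact1 = np.isclose(bB[0], bA[0])               # constant = old coefficient of d1
fact2 = np.allclose(bB[1:3], bA[1:3] - bA[0])  # d2, d3 = differences from d1
fact3 = np.allclose(bB[3:], bA[3:])            # coefficients of x1, x2 unchanged
fact4 = np.allclose(XA @ bA, XB @ bB)          # identical fitted values, hence R^2
print(fact1, fact2, fact3, fact4)
```

Omitting d2 or d3 instead of d1 only changes which column of D is dropped; the same four checks apply with the omitted dummy's coefficient in place of `bA[0]`.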

General  Case  and  Application  of  Principal  Components  Analysis 

If more than one set of dummy variables is included in the equation,
multicollinearity exists even without using the constant term. Any
non-square transformation may be used to exclude any one dummy variable from
each set and include the constant term. As a result, one cannot find a
unique specification, as in the prior simple case, to serve as a basis for
finding the functional relationship between the estimated coefficients in all
the possible transformed equations. Even though all these estimated functions
will give the same estimated dependent variable and the same R^2, the
question arises as to the relative importance of all the dummy variables in a
set. This problem, well known in quantitative psychology, can be approached by
principal components analysis.

Principal components analysis is a technique to find a smaller set
of variables, the principal components, in a linear function to explain each
of the original set of variables. Especially where near-degeneracy exists
in the original data, replacing them with the principal components results
in condensation of information. Finding the principal components may be
an end in itself, or it may be used as a first-stage solution to factor
analysis [2] or as a preliminary to regression analysis, which is the case in
this study.

Since multicollinearity exists among sets of dummy variables and the
constant term, the principal components technique enables us to extract from
them a smaller set of independent variables that reproduce all the data
variation in them. This new set of independent variables, the principal
components, can then be used as regressors instead of the original ones for
explaining the dependent variable. It is shown in Appendix II that the
coefficients of all the original regressors can be obtained by a linear
transformation of the estimated coefficients of the principal components.
Thus, an assessment of the relative importance of all the original regressors
is permitted. Furthermore, for prediction it is not necessary to convert the
original regressors into principal components.
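
As a rough illustration of this procedure, the sketch below regresses y on the principal components of a rank-deficient regressor matrix (constant, three dummies, two ordinary regressors) and then transforms the result back to coefficients for all six original regressors. It uses synthetic data and NumPy; the names `V_star`, `b_z`, `b_x` and the eigenvalue cutoff `1e-8` are assumptions of this sketch, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 38
group = np.arange(N) % 3
D = np.eye(3)[group]                      # dummies d1, d2, d3
Xk = rng.normal(size=(N, 2))              # stand-ins for x1 and x2
y = D @ np.array([20.0, 25.0, 22.0]) + Xk @ np.array([1.5, -0.7]) + rng.normal(size=N)

# Constant + all three dummies + x1, x2: six columns, but only rank five.
X = np.hstack([np.ones((N, 1)), D, Xk])

# Diagonalize X'X and sort eigenvalues in descending order.
eigval, V = np.linalg.eigh(X.T @ X)
order = np.argsort(eigval)[::-1]
eigval, V = eigval[order], V[:, order]

# Keep the eigenvectors that belong to the positive eigenvalues.
r = int(np.sum(eigval > 1e-8 * eigval.max()))
V_star = V[:, :r]

Z = X @ V_star                                 # principal components
b_z = np.linalg.lstsq(Z, y, rcond=None)[0]     # coefficients of the components
b_x = V_star @ b_z                             # coefficients of ALL original regressors

print("rank:", r)
print("coefficients of constant, d1, d2, d3, x1, x2:", np.round(b_x, 3))
print("same fitted values:", np.allclose(X @ b_x, Z @ b_z))
```

Because `b_x` multiplies the original columns directly, a prediction for a new observation can be formed as `x_new @ b_x` without first computing its component scores.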

Examples 

The prior theoretical material can be best illustrated by an example.
The data are based on a subsample of 38 families that purchased automobiles
in 1969, part of a panel of young couples maintained by SRL in Peoria and
Decatur, Illinois. The selected variables include the dependent variable
y, independent variables x1, x2, and a set of dummy variables d1, d2, and
d3, where

y = price of automobile in hundreds of dollars

x1 = husband's education in number of years

x2 = 1969 family income in thousands of dollars

d1 = husband assumes the role of financial officer

d2 = wife assumes the role of financial officer

d3 = husband and wife jointly assume the role of financial officer

[2] Tatsuoka, Maurice M. Multivariate Analysis (New York: John Wiley
and Sons, Inc., 1971), pp. 144-149.

Single equation least-squares is used to estimate the coefficients in
various specifications of the function to explain the price paid for an
automobile. The regression results are summarized in Table 1, each column
depicting a separate function.

Note that R^2 and the estimated coefficients of x1 and x2 are the same
in functions (3) through (7). In function (7) the coefficients of all the
regressors are obtained by a linear transformation of the estimated
coefficients of the principal components.

It is evident that the constant term in each of functions (4), (5), and (6)
is the estimated coefficient in (3) of the dummy variable omitted from that
function. Furthermore, the coefficients of the two included dummy variables
in each of (4), (5), and (6) are the differences between the estimated
coefficients of the corresponding dummy variables in (3) and the constant
term of that function.

Comparing the RSS (regression sum of squares) of function (1) with those
of (3), (4), (5) and (6), it is noted that the additional contribution of
all three dummy variables in (3) is the same as that of any two dummy
variables and the constant term in (4), (5) and (6). Similarly, comparing the
RSS of function (2) with those of (4), (5) and (6), it is also evident that
the additional contribution of any two dummy variables in (4), (5) and (6)
is the same.

It is of interest to observe that the coefficients of d_ in functions
(5) and (6) bear different signs. Therefore, in interpreting the regression
results of any of the functions (4), (5) and (6), the researcher should
be aware that the positive or negative contribution of an included dummy
variable is only relative to the omitted dummy variable in the set.

In empirical studies, it is a common practice to select only variables
with higher t-ratios in the initial regression results for a second
regression fit, with a view to gaining more degrees of freedom with little
sacrifice in goodness of fit. However, one should be cautious in applying this
rule to dummy variables. A dummy variable dropped from the function is
actually combined with the dummy variable initially excluded from the
function, thereby forming a smaller set of dummy variables. This new set of
dummy variables may or may not be the optimal set among all the alternative
sets for the second fit in terms of goodness of fit and mean squared error,
since the initial function is not unique.

For example, suppose out of alternatives (4), (5) and (6), function
(5) was chosen to give the initial regression results, and dummy variable
d1, with a t-ratio of -0.59, is dropped for the second run in (8). The
results of (8) show that R^2 and s^2 (mean squared error) are not as good as
those of (9), in which dummy variable d_ is dropped. On the other hand,
suppose function (4) was chosen to give the initial results; the t-ratio
rule would lead to omission of d_ and the optimal second fit in (9). In
other words, an included dummy variable in the initial function with a
higher t-ratio may have a lower t-ratio when a different dummy variable
is excluded from an alternative initial function. For example, compare the
t-ratios of d1 in (4) and (5).
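
The dependence of a dummy variable's t-ratio on the choice of omitted category can be seen in a small simulation. The sketch below is an assumption-laden illustration on synthetic data (it does not reproduce Table 1); the helper `t_ratios` and the simulated coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 38
group = np.arange(N) % 3
D = np.eye(3)[group]                      # columns D1, D2, D3
Xk = rng.normal(size=(N, 2))              # stand-ins for x1 and x2
y = D @ np.array([20.0, 25.0, 22.0]) + Xk @ np.array([1.5, -0.7]) + rng.normal(size=N)

def t_ratios(X, y):
    """Least-squares coefficients divided by their estimated standard errors."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - p)          # estimated disturbance variance
    return b / np.sqrt(s2 * np.diag(XtX_inv))

ones = np.ones((N, 1))
# d2 omitted: columns are [constant, d1, d3, x1, x2].
t_omit_d2 = t_ratios(np.hstack([ones, D[:, [0, 2]], Xk]), y)
# d3 omitted: columns are [constant, d1, d2, x1, x2].
t_omit_d3 = t_ratios(np.hstack([ones, D[:, [0, 1]], Xk]), y)

print("t-ratio of d1 when d2 is omitted:", round(float(t_omit_d2[1]), 2))
print("t-ratio of d1 when d3 is omitted:", round(float(t_omit_d3[1]), 2))
print("t-ratios of x1, x2 unchanged:", np.allclose(t_omit_d2[3:], t_omit_d3[3:]))
```

The t-ratio of d1 measures its contrast with whichever dummy was left out, so the two fits test different hypotheses even though they give identical fitted values.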

Therefore, some other criterion is needed for combining the dummy
variables in order to gain more degrees of freedom and reduce mean squared
error in the second fit. It seems the results given by the principal
components approach in (7) might throw light upon achieving the purpose.
Since the coefficients of all the dummy variables are obtained, comparison
among them may be helpful. It is observed that the coefficients of d1 and d_
in (7) have the smallest difference among all three possible pairs of
coefficients in the set. It is felt that combining these two dummy variables
would constitute the set that can best serve the purpose. This hypothesis is
borne out by the results in (9). However, no mathematical proof is attempted
in this paper.
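
One way to examine this conjecture on a given data set is to tabulate, for each of the three possible pairs, the gap between the two dummy coefficients alongside the R^2 obtained after merging that pair. The sketch below does so on synthetic data. It takes the gaps from the no-constant fit, which is an assumption of convenience: differences between dummy coefficients are the same in every specification, including the principal components one. It only lists the figures and does not prove or assume that the smallest gap always gives the best fit.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
N = 38
group = np.arange(N) % 3
D = np.eye(3)[group]                      # dummies d1, d2, d3
Xk = rng.normal(size=(N, 2))              # stand-ins for x1 and x2
y = D @ np.array([20.0, 25.0, 22.0]) + Xk @ np.array([1.5, -0.7]) + rng.normal(size=N)

def fit(X, y):
    """Return least-squares coefficients and R^2 (about the mean of y)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return b, r2

# Full fit with all three dummies and no constant term.
b_full, r2_full = fit(np.hstack([D, Xk]), y)

summary = {}
for i, j in combinations(range(3), 2):
    k = 3 - i - j                         # index of the dummy left on its own
    merged = np.column_stack([D[:, i] + D[:, j], D[:, k], Xk])
    _, r2 = fit(merged, y)
    summary[(i + 1, j + 1)] = (abs(b_full[i] - b_full[j]), r2)

# List the pairs from smallest to largest coefficient gap.
for pair, (gap, r2) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"merge d{pair[0]} and d{pair[1]}: |gap| = {gap:.3f}, R^2 = {r2:.4f}")
print(f"unmerged R^2 = {r2_full:.4f}")
```

Each merged model is a restricted version of the full one, so its R^2 can never exceed the unmerged value; the question of interest is which merge loses the least.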

To illustrate the general case of two or more sets of dummy variables,
a set of two dummy variables e1 and e2 is added to the previous model of
automobile price, where

e1 = wife works
e2 = wife does not work

Again, least-squares estimates are run for various specifications of the
function, and are displayed in Table 2.

Note that all nine of the estimated functions in the first nine columns
yield the same R^2 as well as the same coefficients of x1 and x2. In function
(9), the principal components approach is applied to obtain the coefficients
of all the regressors.

When the constant term is omitted from the function, only one dummy
variable has to be excluded from either set. The results of the five possible
choices are shown in columns (1) through (5). It is interesting to note
that the estimated coefficients of e1 and e2 in (3) are those of d_
respectively in (1) and (2). When the constant term is specified in the
function, one dummy variable has to be excluded from each set. Only three of
the possible six functions are shown in columns (6), (7) and (8). Note that
the constant term in (6) is the estimated coefficient of d3 in (1), and that
the estimated coefficients of d1 and d2 in (6) are respectively the
differences between those in (1) and the constant term in (6).

Similar to the case of using only one set of dummy variables, the
results of (9) can throw light upon the choice of combining dummy variables
in a set into a new smaller set. Note that the difference between the
estimated coefficients of d1 and d2 in (9) is the smallest among the possible
three pairs. The results of (11) support the theory that combining d1 and
d2 yields the best R^2 among all the three possible choices of combination.

Summary 

This paper is an attempt to synthesize some known facts about dummy
variables in regression analysis. Some examples are used to display the
behavior of the estimated coefficients of the dummy variables in alternative
specifications. It is pointed out that using the usual criterion of t-ratios
to delete dummy variables may not bring about the best goodness of fit unless
the initial specification is the lucky one among the alternatives. In
addition, the well-known technique of principal components analysis is applied
to the problem, to circumvent the non-uniqueness of specification in using
sets of dummy variables. It is conjectured that the coefficients thus
obtained for all the dummy variables may be useful for the best choice of
combining dummy variables in terms of goodness of fit.

Appendix I


Assume that there are N observations on k independent variables x1, x2, ...,
xk and, without loss of generality, a set of three dummy variables d1, d2,
and d3. To avoid multicollinearity and form a unique basis for later
comparisons, the constant is excluded from the initial specification. Using
matrix notation, the N equations are written

\[
Y = X\beta + u
  = [D_1,\ D_2,\ D_3,\ X_k]
    \begin{bmatrix} \beta_{d_1} \\ \beta_{d_2} \\ \beta_{d_3} \\ \beta_x \end{bmatrix}
    + u
\]

where X is the N x (k+3) total regressor matrix of N observations on
d1, d2, d3, x1, x2, ..., xk,

Y is the N x 1 vector of observations on the regressand,

D_1, D_2 and D_3 are respectively the N x 1 vectors of observations on
d1, d2 and d3,

X_k is the N x k matrix of observations on the k non-dummy regressors
x1, x2, ..., xk,

$\beta_{d_1}$, $\beta_{d_2}$ and $\beta_{d_3}$ are respectively the
coefficients of d1, d2 and d3,

$\beta_x$ is the k x 1 vector of coefficients of the x's, and

u is the N x 1 vector of disturbances, with E(u) = 0 and
Var(u) = $\sigma^2 I$.

The least-squares estimate of the coefficients is given by

\[
\hat{\beta}
  = \begin{bmatrix} \hat{\beta}_{d_1} \\ \hat{\beta}_{d_2} \\ \hat{\beta}_{d_3} \\ \hat{\beta}_x \end{bmatrix}
  = (X'X)^{-1} X'Y
\]

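Next, let the specification be changed by arbitrarily omitting dummy variable
d1 and substituting a constant term c. The new regressor matrix can easily be
obtained by applying a (k+3) x (k+3) non-singular square transformation matrix
to the original total regressor matrix X, i.e.

\[
Z = XT
\]

where

\[
T = \begin{bmatrix}
      1 & 0 & 0 & 0 \\
      1 & 1 & 0 & 0 \\
      1 & 0 & 1 & 0 \\
      0 & 0 & 0 & I_k
    \end{bmatrix}
\]

and where I_k is a k x k identity matrix. The first column of Z is
D_1 + D_2 + D_3, a column of ones, so Z = [1, D_2, D_3, X_k]. In this new
regression function $Y = Z\beta^{*} + u$, the least-squares estimate of the
coefficients for the transformed regressor Z is

\[
\begin{aligned}
\hat{\beta}^{*} &= (Z'Z)^{-1} Z'Y \\
                &= (T'X'XT)^{-1} T'X'Y \\
                &= T^{-1} (X'X)^{-1} T'^{-1} T'X'Y \\
                &= T^{-1} (X'X)^{-1} X'Y \\
                &= T^{-1} \hat{\beta}
\end{aligned}
\]

Since the inverse of T is found as

\[
T^{-1} = \begin{bmatrix}
           1 & 0 & 0 & 0 \\
          -1 & 1 & 0 & 0 \\
          -1 & 0 & 1 & 0 \\
           0 & 0 & 0 & I_k
         \end{bmatrix}
\]

substituting $T^{-1}$ and $\hat{\beta}$ into the equation for
$\hat{\beta}^{*}$ gives the relationship between the components of
$\hat{\beta}^{*}$ and $\hat{\beta}$:

\[
\hat{\beta}^{*}
  = \begin{bmatrix} \hat{\beta}_c \\ \hat{\beta}^{*}_{d_2} \\ \hat{\beta}^{*}_{d_3} \\ \hat{\beta}^{*}_x \end{bmatrix}
  = \begin{bmatrix} \hat{\beta}_{d_1} \\ \hat{\beta}_{d_2} - \hat{\beta}_{d_1} \\ \hat{\beta}_{d_3} - \hat{\beta}_{d_1} \\ \hat{\beta}_x \end{bmatrix}
\]

Finally, the estimated values of the regressand will not be affected by the
change in specification, as shown by

\[
\hat{Y}^{*} = Z\hat{\beta}^{*} = XTT^{-1}\hat{\beta} = X\hat{\beta} = \hat{Y}
\]

Appendix II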
Let  X  be  the  raw  data  regressor  matrix  of  N  observations  on  (k  +  p  +  1) 
regressors,  i.e.,  the  constant  term,  k  non-dummy  independent  variables,  and 
p  dummy  variables  in  m  sets.  Since  there  is  linear  dependency  among  the  dummy 
variables  and  the  constant  term,  the  rank  of  X  is  (k  +  p  +  1  -  m) .  Therefore, 
it  is  possible  to  find  a  set  of  (k  +  p  +  1  -  m)  variables,  smaller  than  the 
original  set  of  (k  +  p  +  1)  variables,  that  reproduce  all  the  data  variation 
in  X.  This  new  set  of  variables  Z,  or  principal  components,  can  then  be 
used  as  regressors  in  place  of  X  for  explaining  Y. 
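
The rank statement can be illustrated with two sets of dummy variables. The sketch below builds a synthetic regressor matrix with a constant, k = 2 ordinary regressors, and p = 5 dummies in m = 2 sets, then counts the numerically zero eigenvalues of X'X. The cutoff `1e-8` and the way the categories are assigned are assumptions chosen so that every combination of categories occurs.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 38
k, p, m = 2, 5, 2                       # two x's; dummies d1-d3 and e1-e2 in two sets

d_cat = np.arange(N) % 3                # category within the first set
e_cat = (np.arange(N) // 3) % 2         # category within the second set
D = np.eye(3)[d_cat]
E = np.eye(2)[e_cat]
Xk = rng.normal(size=(N, k))
X = np.hstack([np.ones((N, 1)), Xk, D, E])    # N x (k + p + 1)

eigval = np.linalg.eigvalsh(X.T @ X)          # eigenvalues in ascending order
n_zero = int(np.sum(eigval < 1e-8 * eigval.max()))
rank = X.shape[1] - n_zero

print("columns:", X.shape[1])                 # k + p + 1
print("rank:", rank)                          # k + p + 1 - m
print("zero eigenvalues:", n_zero)            # m
```

Each set of dummies sums to the constant column, which accounts for one linear dependency per set and hence for the m zero eigenvalues.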

Using principal components analysis, one can find a transformation
$V_*$ such that $Z = XV_*$, by diagonalizing X'X, which is a
(k + p + 1) x (k + p + 1) symmetric matrix. That is,

\[
X'X = V \Lambda V'
\]

where $\Lambda$ is a (k + p + 1) x (k + p + 1) diagonal matrix of eigenvalues
of X'X,

V is a (k + p + 1) x (k + p + 1) matrix of corresponding eigenvectors,
and V'V = I.

Since the rank of X'X is (k + p + 1 - m), one can only find (k + p + 1 - m)
positive eigenvalues in $\Lambda$, i.e., the last m eigenvalues on the
diagonal of $\Lambda$ are zeroes. Thus, a set of linearly independent
variables Z can be found by using the first (k + p + 1 - m) eigenvectors in V
as a transformation $V_*$. The principal components are

\[
Z = XV_*
\]

where Z is N x (k + p + 1 - m),

X is N x (k + p + 1),

$V_*$ is (k + p + 1) x (k + p + 1 - m).
One can then estimate the parameters for the principal components in
the following model

\[
Y = Z\beta_z + u
\]

where Y is the N x 1 vector of the regressand,

Z is the N x (k + p + 1 - m) matrix of derived principal components,

$\beta_z$ is the (k + p + 1 - m) x 1 vector of coefficients of the principal
components, and

u is the N x 1 vector of disturbances with E(u) = 0 and
Var(u) = $\sigma^2 I$.

Applying the least-squares method to the above model gives the following
two results:

1. The estimate of $\beta_z$ is

\[
\begin{aligned}
\hat{\beta}_z &= (Z'Z)^{-1} Z'Y = (V_*' X'X V_*)^{-1} V_*' X'Y \\
              &= (V_*' V \Lambda V' V_*)^{-1} V_*' X'Y \\
              &= \Lambda_*^{-1} V_*' X'Y
\end{aligned}
\]

where $\Lambda_*$ is the (k + p + 1 - m) x (k + p + 1 - m) diagonal matrix of
the first (k + p + 1 - m) positive eigenvalues in $\Lambda$.

2. The estimated values of the regressand are

\[
\begin{aligned}
\hat{Y} &= Z\hat{\beta}_z = (XV_*)\hat{\beta}_z \\
        &= X(V_*\hat{\beta}_z) \\
        &= X\hat{\beta}_x
\end{aligned}
\]

where $\hat{\beta}_x$ is defined to be $V_*\hat{\beta}_z$, or
$V_*(\Lambda_*^{-1} V_*' X'Y)$.

It is evident that the coefficients of all the original regressors can be
obtained by a linear transformation of $\hat{\beta}_z$. Furthermore, in
predicting Y, it is not necessary to convert an observation's X scores into
Z scores.

Finally, it should be noted that the expected value of $\hat{\beta}_x$ is
only a linear transformation of the parameters of the principal components.
That is,

\[
\begin{aligned}
\hat{\beta}_x &= V_*\hat{\beta}_z = V_*\left[\beta_z + (Z'Z)^{-1} Z'u\right] \\
              &= V_*\beta_z + V_*(Z'Z)^{-1} Z'u \\
E(\hat{\beta}_x) &= V_*\beta_z
\end{aligned}
\]

The variance of $\hat{\beta}_x$ is given by

\[
\begin{aligned}
\mathrm{Var}(\hat{\beta}_x)
  &= E\left\{[\hat{\beta}_x - E(\hat{\beta}_x)]\,[\hat{\beta}_x - E(\hat{\beta}_x)]'\right\} \\
  &= E\left[V_*(Z'Z)^{-1} Z'uu'Z (Z'Z)^{-1} V_*'\right] \\
  &= \sigma^2 V_* (Z'Z)^{-1} V_*'
\end{aligned}
\]

where the variance of the disturbance $\sigma^2$ is estimated by

\[
s^2 = \frac{(Y - \hat{Y})'(Y - \hat{Y})}{N - (k + p + 1 - m)}
\]
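
The variance formula can be evaluated directly once the eigen-decomposition is available, since Z'Z equals the diagonal matrix of positive eigenvalues. The following sketch uses synthetic data with one set of three dummies plus a constant (so m = 1); the names `cov_bx` and `std_err` and the cutoff `1e-8` are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 38
group = np.arange(N) % 3
D = np.eye(3)[group]
Xk = rng.normal(size=(N, 2))
y = D @ np.array([20.0, 25.0, 22.0]) + Xk @ np.array([1.5, -0.7]) + rng.normal(size=N)
X = np.hstack([np.ones((N, 1)), D, Xk])       # six columns, rank five

# Eigen-decomposition of X'X; keep the positive eigenvalues only.
eigval, V = np.linalg.eigh(X.T @ X)
keep = eigval > 1e-8 * eigval.max()
V_star, lam_star = V[:, keep], eigval[keep]

# Coefficients of the components and of the original regressors.
b_z = (V_star.T @ X.T @ y) / lam_star         # Lambda*^{-1} V*' X'Y
b_x = V_star @ b_z

# Estimated disturbance variance with divisor N - (k + p + 1 - m).
resid = y - X @ b_x
s2 = resid @ resid / (N - int(keep.sum()))

# Estimated Var(b_x) = s^2 V* Lambda*^{-1} V*'.
cov_bx = s2 * (V_star / lam_star) @ V_star.T
std_err = np.sqrt(np.diag(cov_bx))
print(np.round(std_err, 3))
```

The resulting covariance matrix is singular, reflecting the fact that the six coefficients are determined by only five estimated parameters.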




